Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
-
Free, publicly-accessible full text available October 1, 2026
-
Free, publicly-accessible full text available July 14, 2026
-
Free, publicly-accessible full text available August 28, 2026
-
Scientific discovery is a complex cognitive process that has driven human knowledge and technological progress for centuries. While artificial intelligence (AI) has made significant advances in automating aspects of scientific reasoning, simulation, and experimentation, we still lack integrated AI systems capable of performing autonomous long-term scientific research and discovery. This paper examines the current state of AI for scientific discovery, highlighting recent progress in large language models and other AI techniques applied to scientific tasks. We then outline key challenges and promising research directions toward developing more comprehensive AI systems for scientific discovery, including the need for science-focused AI agents, improved benchmarks and evaluation metrics, multimodal scientific representations, and unified frameworks combining reasoning, theorem proving, and data-driven modeling. Addressing these challenges could lead to transformative AI tools to accelerate progress across disciplines towards scientific discovery.
Free, publicly-accessible full text available April 11, 2026
-
Free, publicly-accessible full text available April 24, 2026
-
Feature transformation aims to reconstruct the feature space of raw features to enhance the performance of downstream models. However, the exponential growth in the combinations of features and operations makes it difficult for existing methods to explore a wide space efficiently. Additionally, their optimization is driven solely by the accuracy of downstream models in specific domains, neglecting the acquisition of general feature knowledge. To fill this research gap, we propose an evolutionary LLM framework for automated feature transformation. This framework consists of two parts: 1) constructing a multi-population database through an RL data collector while using evolutionary algorithm strategies for database maintenance, and 2) drawing on the sequence-understanding ability of Large Language Models (LLMs), employing few-shot prompts to guide the LLM in generating superior samples based on distinctions among feature transformation sequences. The multi-population database initially provides a wide search scope for discovering excellent populations. Through culling and evolution, high-quality populations are given greater opportunities, thereby furthering the pursuit of optimal individuals. By integrating LLMs with evolutionary algorithms, we achieve efficient exploration within a vast space while harnessing feature knowledge to propel optimization, thus realizing a more adaptable search paradigm. Finally, we empirically demonstrate the effectiveness and generality of our proposed method.
Free, publicly-accessible full text available April 11, 2026
-
Free, publicly-accessible full text available February 7, 2026
-
Cation exchange membranes (CEMs) are widely used in many applications. The fixed anionic groups (e.g., –COO⁻, –SO₃⁻, etc.) in the polymer matrix ideally allow the passage only of oppositely charged cations, driven by a potential or a concentration gradient. Anions, which carry the same negative charge as the membrane matrix, cannot pass through the membrane due to electrostatic repulsion. Such “Donnan-forbidden” passage can, however, occur to some degree if the electrical or concentration gradient is high enough to overcome the “Donnan barrier”. Except for salt uptake/transport in concentrated salt solutions, the factors that govern such Forbidden Ion Transport (FIT) have rarely been studied. In most applications of transmembrane ion transport, whether electrically driven as in electrodialysis, or concentration-driven, it is the transport of the counterion to the fixed charged groups, such as that of the proton through a CEM, that is usually of interest. Nevertheless, CEMs are also of interest in analytical chemistry, specifically in suppressed ion chromatography. As used in membrane suppressors, both transport of permitted ions and rejection of forbidden ions are important. If the latter is indeed governed by electrostatic factors, other things being equal, the primary governing factor should be the charge density of the membrane, tantamount to its ion exchange capacity (IEC). In fabricating microscale suppressors, we found it useful to synthesize a new ion exchange polymer that can be easily molded to make tubular microconduits. Despite the high IEC of this material, FIT was also found to be surprisingly high. We measured several relevant properties for thirteen commercial and four custom-made membranes and found that, while FIT is indeed linearly related to 1/IEC for a significant number of these membranes, for very high water-content membranes FIT may be overwhelmingly governed by the water content of the membrane.
In addition, FIT through all CEMs differs greatly among strong acids: they may still be transported as the molecular acids, and the extent is in the same order as the expected activity of the molecular acid in the CEM. These results are discussed with the perspective that, even for strong acids, transport does take place as un-ionized molecular acids.
-
Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the Classification and Regression Trees (CART) algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS adapts to additive structure while remaining highly interpretable. Experiments on real-world datasets show FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding decision-making. Specifically, we introduce a variant of FIGS known as Group Probability-Weighted Tree Sums (G-FIGS) that accounts for heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. Theoretically, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that tree-sum models leverage disentanglement to generalize more efficiently than single tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS performs competitively with random forests and XGBoost on real-world datasets.
Free, publicly-accessible full text available February 18, 2026
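The abstract's point about additive structure can be illustrated with a toy sketch. This is not the FIGS algorithm itself, which grows several trees greedily and simultaneously; it is a simplified stand-in that fits depth-1 stumps one after another to residuals. The function names (`fit_stump`, `fit_stump_sum`, `predict_sum`) and the four-point dataset are hypothetical and chosen only for illustration.

```python
# Toy illustration (NOT the FIGS algorithm): a sum of depth-1 stumps,
# each fitted to the residuals left by the previous ones.

def fit_stump(X, r):
    """Return (feature, threshold, left_mean, right_mean) minimizing squared error on r.

    Assumes at least one feature takes two or more distinct values.
    """
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def predict_stump(stump, row):
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm

def fit_stump_sum(X, y, n_stumps):
    """Fit n_stumps stumps sequentially, each on the current residuals."""
    stumps, residual = [], list(y)
    for _ in range(n_stumps):
        s = fit_stump(X, residual)
        stumps.append(s)
        residual = [ri - predict_stump(s, row) for row, ri in zip(X, residual)]
    return stumps

def predict_sum(stumps, row):
    return sum(predict_stump(s, row) for s in stumps)

# Hypothetical additive target: y = 2*x1 + 3*x2 on a balanced binary grid.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [2 * a + 3 * b for a, b in X]

stumps = fit_stump_sum(X, y, n_stumps=2)
preds = [predict_sum(stumps, row) for row in X]
print(stumps)
print(preds)
```

On this balanced grid, two stumps (two splits in total) reproduce the additive target, whereas a single tree would need three splits to separate the four distinct outcomes. That gap is the kind of additive structure the abstract says single trees handle poorly.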
